Pandas.的query函数为我们提供了一种编写查询过滤条件更简单的方法,特别是在的查询条件很多的时候,本文整理了几个常用的查询方法及实例。
首先生成一份虚拟数据集:
import numpy as np
import pandas as pd
np.random.seed(1)
n = 20
ID = np.arange(1,n+1)
SCORE = np.random.normal(80,10,n).astype('int')
SEX = ['male','famele']*10
AGE = np.random.randint(15,17,n)
df = pd.DataFrame({'ID':ID,'SEX':SEX,'AGE':AGE,'SCORE':SCORE})
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
0 | 1 | male | 15 | 96 |
1 | 2 | famele | 15 | 73 |
2 | 3 | male | 16 | 74 |
3 | 4 | famele | 16 | 69 |
4 | 5 | male | 15 | 88 |
5 | 6 | famele | 15 | 56 |
6 | 7 | male | 15 | 97 |
7 | 8 | famele | 15 | 72 |
8 | 9 | male | 16 | 83 |
9 | 10 | famele | 16 | 77 |
10 | 11 | male | 16 | 94 |
11 | 12 | famele | 15 | 59 |
12 | 13 | male | 16 | 76 |
13 | 14 | famele | 15 | 76 |
14 | 15 | male | 15 | 91 |
15 | 16 | famele | 16 | 69 |
16 | 17 | male | 16 | 78 |
17 | 18 | famele | 15 | 71 |
18 | 19 | male | 16 | 80 |
19 | 20 | famele | 16 | 85 |
在单个条件下进行过滤时,在Query函数中表达式仅包含一个条件。返回的输出将包含该表达式为真的所有行。例如选择所有SCORE超过90的样本
df.query('SCORE>=90')
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
0 | 1 | male | 15 | 96 |
6 | 7 | male | 15 | 97 |
10 | 11 | male | 16 | 94 |
14 | 15 | male | 15 | 91 |
一个或多个条件下过滤,query()的语法都保持不变,但是需要指定两个或多个条件进行过滤的方式
and
:返回在满足两个条件的所有记录or
:返回满足任意条件的所有记录例如查询 SCORE>80
且AGE>15
的样本:
df.query("SCORE>80 and AGE >15")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
8 | 9 | male | 16 | 83 |
10 | 11 | male | 16 | 94 |
19 | 20 | famele | 16 | 85 |
但是要注意如果是查询文本
相关的条件,需要将整个query
条件用双引号""
括起来,内部的字符串值使用单引号''
括起来,如例如查询 SCORE>80
且AGE>15
,SEX
为male
的样本:
df.query("SCORE>80 and AGE >15 and SEX=='male'")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
8 | 9 | male | 16 | 83 |
10 | 11 | male | 16 | 94 |
可以使用括号()
提升判断的先后顺序。
数学操作可以是列中的加,减,乘,除,甚至是列中值或者平方等,比如提取SCORE
的平法加100小于4900的数据:
df.query("SCORE**2+100<4900")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
3 | 4 | famele | 16 | 69 |
5 | 6 | famele | 15 | 56 |
11 | 12 | famele | 15 | 59 |
15 | 16 | famele | 16 | 69 |
Python内置函数,例如sqr,abs,exp等,也可以在查询表达式中使用,如计算SCORE
值在50-70之间的数据:
df.query("abs(SCORE-60)<10")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
3 | 4 | famele | 16 | 69 |
5 | 6 | famele | 15 | 56 |
11 | 12 | famele | 15 | 59 |
15 | 16 | famele | 16 | 69 |
参考资料
import numpy as np
import pandas as pd
np.random.seed(1)
n = 20
ID = np.arange(1,n+1)
SCORE = np.random.normal(80,10,n).astype('int')
SEX = ['male','famele']*10
AGE = np.random.randint(15,17,n)
df = pd.DataFrame({'ID':ID,'SEX':SEX,'AGE':AGE,'SCORE':SCORE})
print(df.to_markdown())
| | ID | SEX | AGE | SCORE | |---:|-----:|:-------|------:|--------:| | 0 | 1 | male | 15 | 96 | | 1 | 2 | famele | 15 | 73 | | 2 | 3 | male | 16 | 74 | | 3 | 4 | famele | 16 | 69 | | 4 | 5 | male | 15 | 88 | | 5 | 6 | famele | 15 | 56 | | 6 | 7 | male | 15 | 97 | | 7 | 8 | famele | 15 | 72 | | 8 | 9 | male | 16 | 83 | | 9 | 10 | famele | 16 | 77 | | 10 | 11 | male | 16 | 94 | | 11 | 12 | famele | 15 | 59 | | 12 | 13 | male | 16 | 76 | | 13 | 14 | famele | 15 | 76 | | 14 | 15 | male | 15 | 91 | | 15 | 16 | famele | 16 | 69 | | 16 | 17 | male | 16 | 78 | | 17 | 18 | famele | 15 | 71 | | 18 | 19 | male | 16 | 80 | | 19 | 20 | famele | 16 | 85 |
df.query('SCORE>=90')
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
0 | 1 | male | 15 | 96 |
6 | 7 | male | 15 | 97 |
10 | 11 | male | 16 | 94 |
14 | 15 | male | 15 | 91 |
print(df.query('SCORE>=90').to_markdown())
| | ID | SEX | AGE | SCORE | |---:|-----:|:------|------:|--------:| | 0 | 1 | male | 15 | 96 | | 6 | 7 | male | 15 | 97 | | 10 | 11 | male | 16 | 94 | | 14 | 15 | male | 15 | 91 |
df.query("SCORE>80 and AGE >15")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
8 | 9 | male | 16 | 83 |
10 | 11 | male | 16 | 94 |
19 | 20 | famele | 16 | 85 |
print(df.query("SCORE>80 and AGE >15").to_markdown())
| | ID | SEX | AGE | SCORE | |---:|-----:|:-------|------:|--------:| | 8 | 9 | male | 16 | 83 | | 10 | 11 | male | 16 | 94 | | 19 | 20 | famele | 16 | 85 |
df.query("SCORE>80 and AGE >15 and SEX=='male'")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
8 | 9 | male | 16 | 83 |
10 | 11 | male | 16 | 94 |
print(df.query("SCORE>80 and AGE >15 and SEX=='male'").to_markdown())
| | ID | SEX | AGE | SCORE | |---:|-----:|:------|------:|--------:| | 8 | 9 | male | 16 | 83 | | 10 | 11 | male | 16 | 94 |
df.query("SCORE**2+100<4900")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
3 | 4 | famele | 16 | 69 |
5 | 6 | famele | 15 | 56 |
11 | 12 | famele | 15 | 59 |
15 | 16 | famele | 16 | 69 |
print(df.query("SCORE**2+100<4900").to_markdown())
| | ID | SEX | AGE | SCORE | |---:|-----:|:-------|------:|--------:| | 3 | 4 | famele | 16 | 69 | | 5 | 6 | famele | 15 | 56 | | 11 | 12 | famele | 15 | 59 | | 15 | 16 | famele | 16 | 69 |
df.query("abs(SCORE-60)<10")
ID | SEX | AGE | SCORE | |
---|---|---|---|---|
3 | 4 | famele | 16 | 69 |
5 | 6 | famele | 15 | 56 |
11 | 12 | famele | 15 | 59 |
15 | 16 | famele | 16 | 69 |
print(df.query("abs(SCORE-60)<10").to_markdown())
| | ID | SEX | AGE | SCORE | |---:|-----:|:-------|------:|--------:| | 3 | 4 | famele | 16 | 69 | | 5 | 6 | famele | 15 | 56 | | 11 | 12 | famele | 15 | 59 | | 15 | 16 | famele | 16 | 69 |